
npj Digital Medicine

85 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
AI-generated data contamination erodes pathological variability and diagnostic reliability
2026-01-22 health informatics 10.64898/2026.01.19.26344383
#1 (30.6%)

Generative artificial intelligence (AI) is rapidly populating medical records with synthetic or partially AI-generated content, creating a feedback loop where future models are increasingly at risk of training on uncurated AI-generated data. However, the clinical consequences of this AI-generated data contamination remain unexplored. Here, we show that in the absence of mandatory human verification, this self-referential cycle drives a rapid erosion of pathological variability and diagnostic rel...

2
Evaluating the AI Potential as a Safety Net for Diagnosis: A Novel Benchmark of Large Language Models in Correcting Diagnostic Errors
2026-02-24 health systems and quality improvement 10.64898/2026.02.22.26346832
#1 (25.6%)

Background: Diagnostic errors are a leading cause of preventable patient harm, often occurring during early clinical encounters where diagnostic uncertainty is maximal. Large language models (LLMs) have shown potential in medical reasoning, yet their ability to function as a diagnostic safety net, specifically by identifying and correcting human diagnostic errors, remains systematically unquantified. We evaluated whether state-of-the-art LLMs can effectively challenge, rather than merely confirm, ...

3
PaiX Net: A Next-Generation Second-Opinion Platform for Pathology
2026-02-09 pathology 10.64898/2026.02.04.26345344
#1 (24.7%)

Pathology faces persistent challenges including a global shortage of specialists, uneven access to expertise, increasing diagnostic complexity, and a growing need for second-opinion consultations. While digital and telepathology platforms address parts of this problem, existing solutions often trade accessibility for structured, workflow-aware clinical integration. At the same time, multimodal medical AI shows promise for diagnostic support but raises concerns regarding transparency, automation ...

4
A Mobile AI-enhanced Platform for Standardized Wound Assessment and Clinical Decision Support
2026-01-23 dermatology 10.64898/2026.01.22.26344407
#1 (24.4%)

Chronic wounds affect over 1.2 million Canadians and incur healthcare costs exceeding $13 billion annually, with global expenditures approaching $149 billion. Current clinical practice relies on manual measurements and subjective visual evaluations, which overestimate wound area by up to 40% and demonstrate poor-to-moderate inter-rater reliability. This variability complicates longitudinal monitoring and evidence-based treatment selection. We developed and evaluated an integrated mobile platform...

5
Computer Vision-Based Retrieval of Steps and Errors in Laparoscopic Cholecystectomy
#1 (24.2%)

Traditional surgical training relies heavily on hands-on experiences gained through relatively infrequent procedures during apprenticeships. Recently, postoperative review has become a valuable supplement to this model, offering learning opportunities outside the operating room. However, its adoption remains limited due to its inefficiencies. In this study, we developed a Computer Vision-based system designed to efficiently navigate and retrieve critical segments from laparoscopic cholecystectom...

6
MedOS: AI-XR-Cobot World Model for Clinical Perception and Action
2026-02-23 health informatics 10.64898/2026.02.18.26345936
#1 (24.1%)

Medicine historically separates abstract clinical reasoning from physical intervention. We bridge this divide with MedOS, a general-purpose embodied world model. Mimicking human cognition via a dual-system architecture, MedOS demonstrates superior reasoning on biomedical benchmarks and autonomously executes complex clinical research. To extend this intelligence physically, the system simulates medical procedures as a physics-aware model to foresee adverse events. Generating and validating on the...

7
Automated Assessment of OSCE Physical Exams using Multimodal AI
2026-01-18 medical education 10.64898/2026.01.09.26343786
#1 (24.1%)

Background: The assessment of physical examination skills in medical education is resource-intensive and prone to inter-rater variability. While artificial intelligence (AI) has successfully automated the grading of clinical notes and transcripts, evaluating the physical techniques themselves (what students do rather than what they say) remains an unsolved challenge. We evaluated whether a multimodal AI system could assess physical examination skills with expert-level reliability. Methods: In this ...

8
Red-Teaming Medical AI: Systematic Adversarial Evaluation of LLM Safety Guardrails in Clinical Contexts
2026-03-05 health informatics 10.64898/2026.02.26.26347212
#1 (24.1%)

Background: Large language models (LLMs) are increasingly deployed in medical contexts as patient-facing assistants, providing medication information, symptom triage, and health guidance. Understanding their robustness to adversarial inputs is critical for patient safety, as even a single safety failure can lead to adverse outcomes including severe harm or death. Objective: To systematically evaluate the safety guardrails of state-of-the-art LLMs through adversarial red-teaming specifically designe...

9
Synergistic barriers to algorithmic recourse in healthcare and administrative systems
2026-02-26 health systems and quality improvement 10.64898/2026.02.22.26346836
#1 (24.0%)

Algorithmic decision systems mediate access to healthcare, credit, employment and housing, yet individuals who experience adverse decisions face multi-stage barriers when seeking recourse. We formalize these barriers as a series-structured system with 11 empirically parameterized stages across three layers (data integration, data accuracy and institutional access) and prove that single-barrier interventions are bounded by baseline system success. Under baseline parameterization derived from fede...

10
ED-Triage-Agent: A Framework for Human-AI Collaborative Emergency Triage
2026-02-18 health informatics 10.64898/2026.02.17.26346501
#1 (23.9%)

Abstract: Emergency Department triage is a critical decision-making process in which clinicians must rapidly assess patient acuity under high cognitive load and time pressure. We present ED-Triage-Agent (ETA), a multi-agent AI framework designed to augment clinical decision-making in Emergency Severity Index (ESI) classification through human-AI collaboration. The system operates in two phases: (1) autonomous patient intake via a conversational agent that collects structured sympto...

11
Population differences in wearable device wear time: Rescuing data to address biases and advance health equity
2026-03-06 health informatics 10.64898/2026.03.06.26347799
#1 (23.6%)

Wearable devices present transformative opportunities for personalized healthcare through continuous monitoring of digital biomarkers; however, individual variations in device wear time could mask or otherwise impact signal identification. Despite the widespread adoption of wearable devices in research, no comprehensive framework exists for understanding how wear time varies across populations or for addressing wear time-related biases in analysis. Using Fitbit data from 11,901 participants in t...

12
MedPI: Evaluating AI Systems in Medical Patient-facing Interactions
2026-01-01 health informatics 10.64898/2025.12.24.25342982
#1 (22.8%)

Abstract: We present MedPI, a high-dimensional benchmark for evaluating large language models (LLMs) in patient-clinician conversations. Unlike single-turn question-answer (QA) benchmarks, MedPI evaluates the medical dialogue across 105 dimensions comprising the medical process, treatment safety, treatment outcomes and doctor-patient communication, using a granular, accreditation-aligned rubric. MedPI comprises five layers: (1) P...

13
Trilingual (EN/ZH-CN/JP) synthetic dataset of cerebral infarction patient-nurse bedside dialogs with metadata
#1 (22.8%)

We propose a large-scale synthetic dataset that correlates structured background information aligned with the actual distribution of patients with cerebral infarction, nurse characteristics, and nurse-patient dialogues across diverse scenarios. Medical dialogue corpora are scarce due to privacy and access restrictions. Even when available, they primarily focus on physician-patient interactions and offer limited metadata (clinical covariates, staff characteristics, etc.). To address this gap, thi...

14
Socially Grounded Exemplars Improve Synthetic Conversations for Health-Related Social Needs Navigation
2026-02-02 health informatics 10.64898/2026.01.30.26345239
#1 (22.8%)

Health-Related Social Needs (HRSNs) significantly impact health outcomes, yet traditional care often fails to address them effectively. While conversational agents offer scalable support, their deployment is hindered by privacy risks and a lack of specialized training data for clinical applications. Synthetic data generation offers a solution to address this gap; standard pipelines often prompt LLMs using structured user personas, comprising demographics, constraints, and goals, to emulate dialo...

15
Representation Before Retrieval: Structured Patient Artifacts Reduce Hallucination in Clinical AI Systems
2026-02-16 health informatics 10.64898/2026.02.13.26346256
#1 (22.6%)

Background: Large language models show promise for clinical decision support, yet their propensity for hallucination (generating plausible but unsupported claims) poses substantial patient safety risks. Retrieval-augmented generation (RAG) is widely assumed to mitigate this problem by grounding outputs in retrieved documents, but this assumption remains inadequately tested in clinical contexts where information density, temporal complexity, and safety stakes are uniquely high. Methods: We develope...

16
Developing and Testing an Engineering Framework for Curiosity-Driven and Humble AI in Clinical Decision Support
2026-02-07 health informatics 10.64898/2026.02.06.26345664
#1 (22.5%)

Background: We present BODHI (Balanced, Open-minded, Diagnostic, Humble, and Inquisitive), an engineering framework for curiosity-driven and humble clinical decision support AI. Despite growing capabilities, large language models (LLMs) often express inappropriate confidence, conflating statistical pattern recognition with genuine medical understanding. BODHI addresses this through a dual-reflective architecture that: (1) decomposes epistemic uncertainty into task-specific dimensions, and (2) cons...

17
Explainable AI as a Double-Edged Sword in Dermatology: The Impact on Clinicians versus The Public
2025-12-29 health informatics 10.64898/2025.12.19.25342205
Top 0.1% (22.5%)

Artificial intelligence (AI) is increasingly permeating healthcare, from physician assistants to consumer applications. Because the opacity of AI algorithms challenges human interaction, explainable AI (XAI) aims to address this by providing insight into AI decision-making, but evidence suggests XAI can paradoxically induce over-reliance or bias. We present results from two large-scale experiments (623 lay people; 153 primary care physicians, PCPs) combining a fairness-based diagnosis AI model and different XAI exp...

18
Multi-Model Clinical Validation of an AI-Powered Biomarker Analysis Framework: A Cross-Vendor Benchmark on 4,018 NHANES Patients
2026-02-17 health informatics 10.64898/2026.02.13.26346284
Top 0.1% (22.4%)

Background: Large language models (LLMs) show promise for clinical decision support, yet most validation studies evaluate single models, leaving questions about generalizability and vendor dependence unanswered. We assessed whether a standardized biomarker analysis framework maintains clinical-grade accuracy across multiple LLMs from independent providers. Methods: We developed a structured prompt-based framework for detecting eight clinical patterns (insulin resistance, diabetes, cardiovascular di...

19
Development and retrospective validation of SCOUT: scalable clinical oversight of large language models via uncertainty triangulation
2026-02-10 cardiovascular medicine 10.64898/2026.02.08.26345860
Top 0.1% (22.3%)

Large language models (LLMs) are increasingly used in clinical workflows, yet requiring clinician review of every AI output negates the efficiency gains that motivate their adoption. We present SCOUT (Scalable Clinical Oversight via Uncertainty Triangulation), a model-agnostic meta-verification framework that selectively defers unreliable LLM predictions to clinicians by triangulating three orthogonal signals: model heterogeneity, stochastic inconsistency, and reasoning critique. In this retrosp...

20
Personalized Insights Derived from Wearable Device Data and Large Language Models to Improve Well-Being
2026-03-04 health informatics 10.64898/2026.03.03.26347299
Top 0.1% (22.1%)

Health behaviors such as physical activity and sleep affect mental health, but the effect of each health behavior varies substantially across individuals, limiting the usefulness of generic behavioral recommendations. We collected one year of continuous wearable and ecological momentary assessment data from 3,139 participants in the Intern Health Study (2018-2023), and examined individual-level associations between wearable-derived features and mood across the internship year. The behaviors asso...